On the Optical Character Recognition and Machine Translation Technology in Arabic: Problems and Solutions
Authors
Abstract
The report addresses the basic problems of formalizing the Arabic language, based on an analysis of linguistic errors in software products. Reviewing the operating principles of modern information systems, the authors conclude that existing methods of formalizing Arabic show a shift towards the technological aspects of the linguistic processing of facts, yet the quality of the applied linguistic components remains poor. Despite a significant number of theoretical and practical results in computational linguistics, it is still uncertain whether traditional recognition algorithms can be applied to Arabic. Several problems in the processing of Arabic text remain to be solved. They can be divided into those related to optical recognition of written text (OCR), word processing (WP) and the building of dictionary content, and machine translation (MT).
Similar resources
A Finite State Model for Urdu Nastalique Optical Character Recognition
Finite state technology has long been used to model NLP (Natural Language Processing) applications; in particular, it has been applied very successfully to machine translation and speech recognition systems. Character recognition in cursive scripts and in handwritten Latin script has also attracted researchers' attention, and some research has been done in this area. Optical character recognition is the ...
A new model for Persian multi-part words edition based on statistical machine translation
Multi-part words in English are hyphenated, with the hyphen separating the different parts. Persian has multi-part words as well. According to Persian morphology, a half-space character is needed to separate the parts of a multi-part word, but in many cases people incorrectly use a space character instead of the half-space character. This common incorrect use of the space leads to some s...
Off-line Arabic Handwritten Recognition Using a Novel Hybrid HMM-DNN Model
Automatic recognition of printed texts and manuscripts facilitates the entry of data into the computer and its digitization, and is a considerable aid to many applications. Research on automatic document recognition started decades ago with the recognition of isolated digits and letters, and today, due to advancements in machine learning methods, efforts are being made to iden...
OCR and Automated Translation for the Navigation of non-English Handsets: A Feasibility Study with Arabic
In forensics, mobile phones or handsets store potentially valuable information such as contact lists, SMS messages, or possibly emails and calendar appointments. However, navigating to this content on handsets configured in a language other than English becomes a difficult task when the operator is untrained in that language. We discuss a feasibility study that explored the performance of optical character recogni...
Person Name Recognition Using Injection of Candidate Name Words into Conditional Random Fields for Arabic
Named entity recognition and extraction are very important tasks for discovering proper names, including persons, locations, dates, and times, inside electronic textual resources. An accurate named entity recognition system is an essential utility for resolving fundamental problems in question answering systems, summary extraction, information retrieval and extraction, machine translation, video interpr...